To Do (Docs):
Welcome to this splendid, informal and slightly (ok, very) snarky log. If the images don't show up, I put copies in the doc_images folder, so you should find them there.
The goal for this week is to work out whether I can use ESRGAN off the bat with space images, and to process the data. I also need to make sure all packages and dependencies work, and work out how to run an intensive program without it hogging all of the machine's memory.
I started with working out how to run a program without it taking the full memory and CPU.
For CPU: https://www.tecmint.com/limit-cpu-usage-of-a-process-in-linux-with-cpulimit-tool/
For time and memory: https://www.tecmint.com/limit-time-and-memory-usage-of-linux-process/
For GPU: Apparently, can't be done. Whyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy............................
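Note to future me: a rough sketch of doing the memory and CPU-time capping from Python instead of the tools in the links above. This is a different technique (kernel resource limits via the standard `resource` module, Linux only), not what those articles describe, and `run_limited` is a name I made up for illustration. Also note that `RLIMIT_CPU` caps total CPU seconds; it does not throttle CPU percentage the way cpulimit does.

```python
import resource
import subprocess
import sys


def _set_soft_limit(which, value):
    # Only lower the soft limit and leave the hard limit alone,
    # which an unprivileged process is always allowed to do.
    soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, hard))


def run_limited(cmd, max_mem_bytes, max_cpu_seconds):
    """Run cmd in a child process with caps on address space and CPU time."""

    def apply_limits():
        # Runs in the child just before exec, so only the child is capped.
        _set_soft_limit(resource.RLIMIT_AS, max_mem_bytes)
        _set_soft_limit(resource.RLIMIT_CPU, max_cpu_seconds)

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    result = run_limited([sys.executable, "-c", "print('still alive')"],
                         one_gib, 60)
    print(result.returncode, result.stdout.strip())
```

A child that tries to allocate more than the cap should fail with a memory error instead of dragging the whole machine down with it. None of this helps with the GPU, sadly.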
First, I went to check Python version - got both Python 2 and 3 up to date.
Then tried to install Anaconda. It kept flagging "command not found", so I reinstalled Anaconda. It didn't work. Turns out, I just needed to refresh the terminal. '.................................'
Ok, I have now installed PyTorch compatible with CUDA 9.0.
I'm now installing CUDA 9.0 (hope the reverse order of installation doesn't cause any issues).
I don't like installing CUDA 9.0. I think it has failed to install? Next day me: yep, it failed to install, despite asking my flatmate for help. This was due to dependency issues on my computer - namely, missing libc and libc6 packages (and their developer variants), caused by a mismatch between the expected and installed versions. Downgrading and upgrading was difficult, if not impossible, as installing the missing library required one of the missing libraries. I don't know how to fix my lack-of-GPU issue - help!!
I installed the no-CUDA (CPU-only) version of PyTorch as well, just in case.
If there are any issues with the graphics drivers, just beware that I might have messed them up while trying to fix the CUDA installation issues.
Ok, getting Keras and Tensorflow:
Done. Installed Keras through Anaconda, and TensorFlow came along with it as a dependency.
Ok, trying for the kjnkajdfbajnad^{th} time to make CUDA work. So far, I managed to make Nvidia drivers work (yay! I now have a driver for my graphics card!!) by using this https://www.linuxbabe.com/ubuntu/install-nvidia-driver-ubuntu-18-04
I used the section: "How to Install Nvidia Driver on Ubuntu 18.04 From the Command Line". NOTE: PLEASE avoid the section "Install the Latest Version Of Nvidia Drivers via PPA" because it's no longer relevant (Ubuntu has the latest version anyway), and even if you use it, it might not be very stable.
Ok, trying the following configuration from Nvidia from here https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=runfilelocal, as shown:
Did a first test of ESRGAN - this went... well, interestingly. An estimate for the CPU-only version shows that, with the number of images per batch capped for 4 cores, it would take just over 2 months to run once. Not ideal at all. CUDA was needed.
*......Over the next two weeks........*
Reinstalled Ubuntu to remove dependency issues.
Tried to reinstall CUDA, ran into dependency issues with build-essential (like before).
Reinstalled Ubuntu (18.04) again; this time, we wiped the USB and didn't use UNetbootin.
Booted the new operating system, and installed build-essential as the first thing, then CUDA as the second thing.
Everything works so far.
Installed Keras, pip, Anaconda, a virtual environment, PyTorch, torchvision, ESRGAN.
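After all that pain, a quick sanity check that PyTorch can actually see the graphics card seems wise. A minimal sketch, assuming PyTorch is importable in the active environment; `cuda_report` is just a helper name I picked, and device index 0 assumes a single-GPU machine.

```python
def cuda_report():
    """Return a one-line summary of whether PyTorch can use CUDA."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but it cannot see a CUDA device"
    # Index 0 assumes there is only one GPU in the machine.
    return "CUDA device found: " + torch.cuda.get_device_name(0)


print(cuda_report())
```

If this reports no CUDA device, the likely suspects are the driver, the CUDA install, or a CPU-only PyTorch build.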
Took over 5 attempts to download the dataset for ESRGAN (Google Drive played up). Make sure to wait until the file is fully downloaded before you try to move it. The Ubuntu-Firefox combo will let you mess with files before they've finished downloading, which can definitely cause issues. I think the issue was fixed by either downloading through Chromium (which makes the download process more obvious, as it's not hidden behind the downloads button) or checking that the file had definitely finished downloading in Firefox.
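To save future me from attempt number 6: a small sketch for checking that a downloaded archive is complete before moving or unzipping it. It assumes the dataset arrives as a .zip file, and `download_looks_complete` is a made-up helper name, not part of ESRGAN.

```python
import zipfile


def download_looks_complete(path):
    """Return True if path is a readable zip whose members all pass their CRC check."""
    if not zipfile.is_zipfile(path):
        # A half-downloaded zip usually lacks the directory stored at the end.
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False
```

Call it with the path to the downloaded file; anything other than True means the download is worth retrying before blaming ESRGAN.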
Ran ESRGAN for the first time with CUDA. It ran after unzipping the data file, reducing the number of images per batch from 256 to 16, and reducing the CPU cores (though I doubt it matters) from 8 to 4. Without these reductions, the program would not run, as there wouldn't be enough memory. I also reduced the number of epochs from 200 to 5, but that didn't make much difference: at 3 1/4 hours per epoch, I stopped after 1 hour of computation anyway.
Initial parameters (the image shows the original values; the circled ones are the parameters I changed):
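For the record, the arithmetic behind giving up. A tiny sketch using only the numbers above (3 1/4 hours per epoch, 5 or 200 epochs); it assumes every epoch takes the same time, which I have not measured.

```python
HOURS_PER_EPOCH = 3.25  # measured estimate from the run above


def total_hours(epochs, hours_per_epoch=HOURS_PER_EPOCH):
    """Total training time, assuming every epoch takes the same time."""
    return epochs * hours_per_epoch


for epochs in (5, 200):
    hours = total_hours(epochs)
    print(f"{epochs} epochs: {hours:.2f} h (about {hours / 24:.1f} days)")
```

So even the cut-down 5 epochs would be 16.25 hours, and the original 200 would be 650 hours, i.e. roughly 27 days.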
GPU and CPU readings:
This is a bit ridiculous. So, attempt 2.
After some discussion, it turned out the main reason is that the number of training images was very large (over 100,000). As a result, a smaller training sample is needed - the data we already have is perfect for this.
Using the image from here and gen_training.py from my supervisor, I produced 1000 images. If the link doesn't work, the image looks like this:
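A back-of-the-envelope sketch of why a smaller set should help. Big assumption, loudly stated: epoch time scales linearly with the number of training images, which I have not verified. The 100,000 figure is the lower bound from above, the batch size of 16 and 3 1/4 hours per epoch come from the earlier run, and 1,000 is the size of the smaller set.

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


def scaled_epoch_hours(measured_hours, measured_images, new_images):
    """Epoch time for a different dataset size, ASSUMING linear scaling."""
    return measured_hours * new_images / measured_images


for n in (100_000, 1_000):
    hours = scaled_epoch_hours(3.25, 100_000, n)
    print(n, "images:", batches_per_epoch(n, 16), "batches,",
          round(hours * 60, 1), "minutes per epoch")
```

Under that assumption, 1,000 images means 63 batches per epoch instead of 6,250, and roughly 2 minutes per epoch instead of 3 1/4 hours.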